Abstract

We describe a new approximation algorithm for Max Cut. Our algorithm runs in $\tilde{O}(n^2)$ time, where $n$ is the number of vertices, and achieves an approximation ratio of .531. On instances in which an optimal solution cuts a $1 - \epsilon$ fraction of edges, our algorithm finds a solution that cuts a $1 - 4\sqrt{\epsilon} + 8\epsilon - o(1)$ fraction of edges.

Our main result is a variant of spectral partitioning, which can be implemented in nearly linear time. Given a graph in which the Max Cut optimum is a $1 - \epsilon$ fraction of edges, our spectral partitioning algorithm finds a set $S$ of vertices and a bipartition $L$, $R = S - L$ of $S$ such that at least a $1 - O(\sqrt{\epsilon})$ fraction of the edges incident on $S$ have one endpoint in $L$ and one endpoint in $R$. (This can be seen as an analog of Cheeger's inequality for the smallest eigenvalue of the adjacency matrix of a graph.) Iterating this procedure yields the approximation results stated above.

A different, more complicated, variant of spectral partitioning leads to an $\tilde{O}(n^3)$ time algorithm that cuts a $1/2 + e^{-\Omega(1/\epsilon)}$ fraction of edges in graphs in which the optimum is $1/2 + \epsilon$.

1 Introduction 

In the Max Cut problem, we are given an undirected graph with non-negative weights on the edges
and we wish to find a partition of the vertices (a cut) which maximizes the weight of edges whose 
endpoints are on different sides of the partition (such edges are said to be cut by the partition). 
We refer to the cost of a solution as the fraction of weighted edges of the graph that are cut by the 
solution. 

It is easy, given any graph, to find a solution that cuts half of the edges, providing an approximation factor of 1/2 for the problem. The algorithm of Goemans and Williamson [GW95], based on a Semidefinite Programming (SDP) relaxation, has a performance ratio of .878... on general graphs, and it finds a cut of cost $1 - O(\sqrt{\epsilon})$ in graphs in which the optimum is $1 - \epsilon$. Assuming the unique games conjecture, both results are best possible for polynomial time algorithms [Kho02, KKMO04, MOO05] (see also [OW08]). Arora and Kale [AK07] show that the Goemans-Williamson SDP relaxation can be near-optimally solved in nearly linear time in graphs of bounded degree (or, more generally, in weighted graphs with bounded ratio between largest and smallest degree). We show in Appendix A.1 that, using a reduction [Tre01], the Arora-Kale algorithm can be used to achieve the approximation performance of the Goemans-Williamson algorithm on all graphs in nearly-linear time.
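The claim that half of the edges can always be cut is easy to make concrete. The following is a minimal sketch of one standard greedy procedure, not taken from the paper: vertices are placed one at a time on the side opposite to the heavier group of already-placed neighbours. The function names and the `(i, j, w)` edge-list representation are assumptions made for illustration, and self-loops are assumed absent.

```python
def greedy_cut(n, edges):
    """Place vertices 0..n-1 one at a time; `edges` is a list of (i, j, w)."""
    adj = [[] for _ in range(n)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    side = [None] * n
    for v in range(n):
        # weight[s] = total weight from v to already-placed neighbours on side s
        weight = [0.0, 0.0]
        for u, w in adj[v]:
            if side[u] is not None:
                weight[side[u]] += w
        # joining side 0 cuts the edges toward side 1, and vice versa
        side[v] = 0 if weight[1] >= weight[0] else 1
    return side


def cut_weight(edges, side):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for i, j, w in edges if side[i] != side[j])
```

Each edge is decided when its later endpoint is placed, and at that moment at least half of the weight toward placed neighbours is cut, which is where the factor 1/2 comes from.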

A different rounding algorithm for the Goemans-Williamson relaxation, due to Charikar and Wirth [CW04], finds a solution that cuts at least a $1/2 + \Omega(\epsilon / \log(1/\epsilon))$ fraction of edges in graphs in which the optimum is $1/2 + \epsilon$. This result too is tight, assuming the unique games conjecture [KO06].

No method other than SDP is known to yield an approximation better than 1/2 for Max Cut, 
and such approximation has been ruled out for large classes of Linear Programming Relaxations 
[dlVKM07, STT07]. 

A main source of difficulty in designing approximation algorithms for max cut is the lack of good upper bound techniques for the max cut optimum of general graphs. Indeed, suppose that one is able to design and analyse a new polynomial-time algorithm for max cut achieving, say, a .51 approximation ratio, and consider the behaviour of the algorithm when given a graph whose max cut optimum is .501. Then the algorithm will clearly output a cut of cost $\le .501$, but then the computations performed by the algorithm, plus the proof of its approximation ratio, provide a certificate that the optimum cut in the given graph is $\le .501/.51 < .983$. The problem is that, except for semidefinite programming, we know of no technique that can provide, for every graph of max cut optimum $\le .501$, a certificate that its optimum is $\le .99$. Indeed, the results of [dlVKM07, STT07] show that large classes of Linear Programming relaxations of max cut are unable to distinguish such instances.

It is possible, however, to develop a new approximation algorithm that uses semidefinite programming only in the analysis, by showing that if the algorithm outputs a cut of cost $c$, then there is a dual solution for the Goemans-Williamson SDP relaxation of cost at most $c/.51$, thus proving that the max cut optimum is at most $c/.51$ and that the algorithm has a performance ratio at least .51. Such primal-dual algorithms, which use a relaxation only in the analysis, have been derived for several problems based on Linear Programming relaxations, but unfortunately, as discussed above, linear programming relaxations are unlikely to be helpful in max cut approximation. As far as we know, the only examples of primal-dual approximation algorithms for combinatorial problems based on Semidefinite Programming are the algorithms for the sparsest cut problem described in [ARV04, KRV06, OSW08].

Our Results 

Our main result is a variant of the spectral partitioning algorithm with the following property: given a graph $G = (V, E)$ in which the Max Cut optimum cost is $1 - \epsilon$, it finds a set $S$ and a partition of $S$ into two disjoint sets of vertices $L$, $R$ such that the number of edges with one endpoint in $L$ and one endpoint in $R$ is at least a $1 - O(\sqrt{\epsilon})$ fraction of the total number of edges incident$^1$ on $S$. More precisely, we show that the number of edges having both endpoints in $L$ or both endpoints in $R$, plus half the number of edges having an endpoint in $S$ and an endpoint in $V - S$, is at most a $2\sqrt{\epsilon} + o(1)$ fraction of the edges incident on $S$. (See Theorem 1 and the subsequent discussion.) We will ignore the $o(1)$ additive terms in the subsequent discussion in this section.

To derive an approximation algorithm for Max Cut, given a graph we apply the partitioning algorithm and find sets $L$, $R$ as above, we remove the vertices in $L \cup R$ from the graph, recursively find a partition of the residual graph, and then put back the vertices of $L$ on one side of the partition and the vertices of $R$ on the other side. This means that we cut all the edges that are cut in the recursive step, plus all the edges with one endpoint in $L$ and one endpoint in $R$, plus at least half of the edges between $S$ and $V - S$. The recursion is stopped when less than half of the edges incident on $S$ are cut, in which case we return a greedy partition of the residual graph.

1 An edge $(i, j)$ is incident on a set $S$ of vertices if at least one of the endpoints $i$, $j$ belongs to $S$.

We present an analysis of the recursive procedure due to Moses Charikar, which improves an analysis of ours which appeared in a previous version of this paper. The following observation plays an important role in the analysis: at a generic step of the execution of the algorithm, if the optimal solution in the original graph is $1 - \epsilon$, and the current residual graph holds a $\rho$ fraction of the original edges, then we know that the optimum in the current residual graph is at least $1 - \epsilon/\rho$, and the spectral algorithm cuts at least a $1 - 2\sqrt{\epsilon/\rho}$ fraction of the edges incident on $L \cup R$. When the recursion ends, it is because the spectral algorithm cuts less than half of the edges incident on $L \cup R$, and so the optimum of the residual graph at the end of the recursion must be less than 15/16, meaning that the residual graph at the end of the recursion contains at most a $16\epsilon$ fraction of the edges of the original graph. Putting together this information, a calculation shows that the algorithm cuts at least a $1 - 4\sqrt{\epsilon} + 8\epsilon$ fraction of edges of the graph. The ratio $(1 - 4\sqrt{\epsilon} + 8\epsilon)/(1 - \epsilon)$ is always at least .531.

When applied to graphs in which the optimum is close to 1/2 (in fact, to any graph in which the optimum is smaller than 15/16), our algorithm may simply return a greedy partition. Thus, it fails to provide any non-trivial approximation to the Max CutGain problem, which is the same as the Max Cut problem, except that we count the number of cut edges minus $|E|/2$. (Equivalently, we count the number of cut edges minus the number of uncut edges.) For Max CutGain we develop a more sophisticated spectral partitioning algorithm with the following property: given a graph in which the Max Cut optimum is $1/2 + \epsilon$, our algorithm finds sets $L$, $R$ such that the number of edges incident on $L \cup R$ cut by the partition exceeds the number of uncut edges by at least a $1/\exp(\Omega(1/\epsilon))$ fraction of the edges incident on $L \cup R$. Iterating this algorithm allows us to find a cut for the entire graph of cost at least $1/2 + 1/\exp(\Omega(1/\epsilon))$.

This second algorithm can also be applied to the case in which edges have negative weights, and it approximates a general class of quadratic programs. Given a symmetric real-valued matrix $Q$ with zeroes on the diagonal, if there exists a vector $x \in \{-1, 1\}^V$ such that $x^T Q x \ge \epsilon \cdot \|Q\|_1$, our algorithm finds a vector $y \in \{-1, 1\}^V$ such that $y^T Q y \ge \exp(-O(1/\epsilon)) \cdot \|Q\|_1$, where $\|Q\|_1 := \sum_{i,j} |Q(i,j)|$. (The algorithm of Charikar and Wirth finds a vector $y$ such that $y^T Q y \ge \|Q\|_1 \cdot \epsilon / \log(1/\epsilon)$.)

Relation to Cheeger's Inequality 

In the case of regular graphs, our main result, Theorem 1, may be seen as an analog of Cheeger's inequality [Alo86] for the smallest (rather than second largest) eigenvalue of the adjacency matrix of the graph. We discuss this analogy in Section 5.

Relation to the Goemans-Williamson Relaxation

Our algorithm may also be seen as a primal-dual algorithm that produces, along with a cut, a feasible solution to the semidefinite dual of the Goemans-Williamson relaxation such that the cost of the cut is at least .531 times the cost of the dual solution. We describe this view in Section 6.

Other Relations to Previous Work

It has been known that one can use spectral methods to certify an upper bound to the Max Cut optimum of a given graph. In particular, if $G$ is a $d$-regular graph of adjacency matrix $A$, and $M := \frac{1}{d} A$ has eigenvalues $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, then one can easily show$^2$ that

$$\text{Max Cut} \le \frac{1}{2} + \frac{1}{2} |\lambda_n| \qquad (1)$$

(Our Lemma 2 is essentially a restatement of this fact.)

What is new is that we are able to prove a converse, in Lemma 3, and show that a non-trivial consequence follows whenever $|\lambda_n|$ is close to 1.

As mentioned above, it was known that $\lambda_n = -1$ if and only if $G$ has a bipartite connected component. In particular, if $G$ is connected and not bipartite then $\lambda_n > -1$. Alon and Sudakov [AS00] consider the question of how small, in such a case, the gap $1 - |\lambda_n|$ can be. They show that, if $G$ is connected and not bipartite, it has maximum degree $d$ and diameter $D$, and $\lambda_n$ is the smallest eigenvalue of the adjacency matrix $A$, then $d - |\lambda_n| \ge$ [...]. The bound was improved to $d - |\lambda_n| \ge$ [...] by Cioabă [Cio07]. Our result implies the weaker bound $d - |\lambda_n| \ge$ [...] in a $d$-regular graph.

The "converse expander mixing lemma" of Bilu and Linial [BL06] has some similarity with our approach to Max CutGain. Bilu and Linial show that if $G$ is a $d$-regular graph, $A$ is the adjacency matrix, and $\lambda_1 \ge \cdots \ge \lambda_n$ are the eigenvalues of $M := \frac{1}{d} A$, then if $\max\{\lambda_2, |\lambda_n|\} \ge \epsilon$ it follows that there are sets $L$, $R$ such that the number of edges between $L$ and $R$ differs from what one would expect in a random $d$-regular graph by a multiplicative error factor $O(\epsilon / \log(1/\epsilon))$. In our main result for Max CutGain (Theorem 11) we have a stronger assumption, that $|\lambda_n| \ge \epsilon$, but we need to derive a much stronger conclusion, namely that the number of edges between $L$ and $R$ not only exceeds the number of edges that one would expect in a random $d$-regular graph (a fact that can probably be proved with the same quantitative result of Bilu-Linial), but in fact exceeds the number of edges which are entirely contained in $L$ or entirely contained in $R$.

The main difference between our proof and the proof of Bilu and Linial is that the combinatorial quantity that they relate to $\max\{\lambda_2, |\lambda_n|\}$ is the optimum of the normalized multilinear form $\max_{x, y \in \{0,1\}^n} |x^T M y| / (\|x\| \cdot \|y\|)$, for a certain matrix $M$, while the combinatorial quantity that we wish to relate to $|\lambda_n|$ is the optimum of the normalized homogeneous quadratic form $\max_{x \in \{-1,0,1\}^n} |x^T M x| / \|x\|^2$, for a different matrix $M$. Generally, it is considerably harder to round continuous relaxations of quadratic forms of the latter type compared to multilinear forms of the first kind. (See e.g. the introduction of [CW04] and their discussion of their results versus the results of Alon and Naor [AN06].)

2 Inequality (1) appears to be a folklore result. Lovász [Lov03, Proposition 4.4] credits it to Delorme and Poljak [DP93a, DP93b]. The earliest related reference we are aware of is [Hae79, Theorem 2.1.4.i], which states that if $V_1, V_2$ is a partition of a $d$-regular graph $G = (V, E)$ with $n_1 = |V_1|$, and if $d_1$ is the average degree of the subgraph induced by $V_1$, then $n_1 d - n d_1 \le -\lambda_n \cdot (n - n_1)$, from which one can derive that $n_1 \cdot (d - d_1)$, the number of edges crossing the cut, obeys $n_1 \cdot (d - d_1) \le n_1 \cdot (n - n_1) \cdot (d - \lambda_n)/n$, and the latter term is at most $n \cdot (d - \lambda_n)/4$.
The idea of iteratively removing parts of an instance in which one has a good solution appears in various works on the sparsest cut problem (for example in the way Spielman and Teng [ST04] find a balanced separator using their "nibble" procedure), and it was used to approximate the Max Cut problem (in the version in which one wants to minimize the number of uncut edges) by Agarwal
et al. [ACMM05]. In the algorithm of Agarwal et al., as in our algorithm, the basic procedure that 
is being iterated finds a set S of vertices and a bipartition L, R of S such that most of the edges 
incident on S have one endpoint in L and one endpoint in R. 



2 Sparsification 

It follows from the Chernoff Bound that if we are given a graph $G = (V, E)$ and we sample $O(\delta^{-2} |V|)$ edges with replacement$^3$ then, with high probability, every cut $(S, \bar{S})$ has the same cost in the original graph as in the new graph, up to an additive error $\delta$.$^4$

For this reason, all the dependency on $|E|$ in the running time of our algorithm can be changed to a dependency on $|V|$ with an arbitrarily small loss in the approximation factor.
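The sampling step can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the number of samples is left as a parameter because the constant hidden in $O(\delta^{-2} |V|)$ is not specified here.

```python
import random


def sparsify(edges, num_samples, rng):
    """Sample edges with replacement, with probability proportional to weight.

    `edges` is a list of (i, j, w); the result is an unweighted multigraph
    given as a list of (i, j, 1). `num_samples` is chosen by the caller.
    """
    weights = [w for _, _, w in edges]
    picks = rng.choices(edges, weights=weights, k=num_samples)
    return [(i, j, 1) for i, j, _ in picks]


def cut_fraction(edges, side):
    """Fraction of the total edge weight cut by the partition `side`."""
    total = sum(w for _, _, w in edges)
    cut = sum(w for i, j, w in edges if side[i] != side[j])
    return cut / total
```

Comparing `cut_fraction` on the original and on the sampled edge list for a fixed partition gives a quick empirical feel for the additive error.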



3 The Spectral Algorithm 

In this section we prove our main result. 

Theorem 1 (Main) There is an algorithm that, given a graph $G = (V, E)$ for which the optimum of the Max Cut problem is at least $1 - \epsilon$, and a parameter $\delta$, finds a vector $y \in \{-1, 0, 1\}^V$ such that

$$\frac{\sum_{i,j} A_{i,j} |y_i + y_j|}{\sum_i d_i |y_i|} \le 4\sqrt{\epsilon} + \delta$$

where $A_{i,j}$ is the weight of edge $(i,j)$ and $d_i$ is the (weighted) degree of vertex $i$.

The algorithm can be implemented in nearly-linear randomized time $O(\delta^{-2} \cdot (|V| + |E|) \cdot \log |V|)$.

To understand the statement of Theorem 1, let $y$ be the vector returned by the algorithm, and call $L$ the set of vertices with negative coordinates in $y$, $R$ the set of vertices with positive coordinates, and $S := L \cup R$. Then, up to constant factors, the numerator counts the number of edges incident on $S$ which fail to have one endpoint in $L$ and one endpoint in $R$, and the denominator counts the number of edges incident on $S$. More specifically, the numerator counts four times the edges that are entirely contained in $L$ or entirely contained in $R$, and twice the edges that have one endpoint in $S$ and one endpoint in $V - S$. The denominator counts every edge incident on $S$ once or twice, depending on whether one or both the endpoints of the edge are in $S$.
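The counting in the paragraph above can be checked mechanically. The sketch below (function name and edge-list representation are assumptions for illustration) computes the numerator and denominator of the ratio together with the number $U$ of edges inside $L$ or inside $R$, the number $X$ of edges with exactly one endpoint in $S$, and the number $M$ of edges incident on $S$, so that one can verify numerator $= 4U + 2X$ and denominator $= 2M - X$.

```python
def ratio_terms(edges, y):
    """Return (numerator, denominator, U, X, M) for y in {-1, 0, 1}^V.

    `edges` lists each undirected edge once as (i, j, w).
    """
    deg = {}
    for i, j, w in edges:
        deg[i] = deg.get(i, 0) + w
        deg[j] = deg.get(j, 0) + w
    # the sum over ordered pairs (i, j) sees every undirected edge twice
    numerator = sum(2 * w * abs(y[i] + y[j]) for i, j, w in edges)
    denominator = sum(d * abs(y[v]) for v, d in deg.items())
    U = sum(w for i, j, w in edges if y[i] == y[j] != 0)
    X = sum(w for i, j, w in edges if (y[i] == 0) != (y[j] == 0))
    M = sum(w for i, j, w in edges if y[i] != 0 or y[j] != 0)
    return numerator, denominator, U, X, M
```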



3 If the graph is unweighted, we sample from the uniform distribution over the edges; otherwise we sample from 
the distribution in which each edge has a probability proportional to its weight. 

4 Note that the sparsified graph is an unweighted multigraph, and that the sparsification process is considerably 
simpler than the one used for algorithms for sparsest cut and other graph minimization problems. 






The following form of the conclusion of Theorem 1 will be convenient in our analysis: given the vector $y$, call $M$ the number of edges incident on $L \cup R$, $U$ the number of "uncut" edges that have both endpoints in $L$ or both endpoints in $R$, and $X$ the number of "cross" edges that have exactly one endpoint in $L \cup R$; then

$$U + \frac{1}{2} X \le \left(2\sqrt{\epsilon} + \frac{\delta}{2}\right) \cdot M$$

Let $A$ be the adjacency matrix of our input graph $G$ (hence $A_{i,j}$ is the weight of the edge between $i$ and $j$), and $D$ be the diagonal matrix such that $D_{i,i}$ is the weighted degree $d_i$ of vertex $i$ and $D_{i,j} = 0$ for $i \ne j$.

Theorem 1 follows by combining the following two results, and noting that, for $a, b \ge 0$, $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$.

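Lemma 2 If the optimum Max Cut in $G$ has cost at least $1 - \epsilon$, there is a vector $x \in \mathbb{R}^V$ such that

$$x^T (D + A) x \le 2\epsilon \cdot x^T D x .$$

Furthermore, for every $\delta > 0$, we can find in time $O(\delta^{-1} \cdot (|E| + |V|) \cdot \log |V|)$ a vector $x \in \mathbb{R}^V$ such that

$$x^T (D + A) x \le (2\epsilon + \delta) \cdot x^T D x$$

Lemma 3 Given a vector $x \in \mathbb{R}^V$ such that $x^T (D + A) x \le \epsilon \cdot x^T D x$, we can find in time $O(|E| + |V| \log |V|)$ a vector $y \in \{-1, 0, 1\}^V$ such that

$$\frac{\sum_{i,j} A_{i,j} |y_i + y_j|}{\sum_i d_i |y_i|} \le \sqrt{8\epsilon} \qquad (2)$$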

Lemma 2 has a simple proof, and it can be seen as a statement about the semidefinite dual of the Goemans-Williamson relaxation, as discussed in Section 6. Lemma 3 is the main result of this paper.

3.1 Proof of Lemma 2 

Consider the optimization problem

$$\min_{x \in \mathbb{R}^V} \frac{x^T A x}{x^T D x} \qquad (3)$$

Let $(S, \bar{S})$ be an optimum cut for $G$, and define the vector $x^* \in \{-1, 1\}^V$ such that $x^*_i = 1$ if $i \in S$ and $x^*_i = -1$ otherwise. Then $x^{*T} A x^*$ equals twice the difference between the number of edges not cut by $(S, \bar{S})$ and the number of edges that are cut, which is at most $2 \cdot (2\epsilon - 1) \cdot |E|$. As for $x^{*T} D x^*$, we have

$$x^{*T} D x^* = \sum_i d_i \cdot (x^*_i)^2 = \sum_i d_i = 2 \cdot |E|$$

Thus $x^*$ is a feasible solution to (3) of cost at most $2\epsilon - 1$, and if $x$ is the optimal solution to (3), then we must have

$$x^T A x \le (2\epsilon - 1) \cdot x^T D x$$

To prove the "furthermore" part of the lemma, we observe that the optimization problem in (3) is equivalent to

$$\min_{x \in \mathbb{R}^V} \frac{x^T D^{-1/2} A D^{-1/2} x}{x^T x} \qquad (4)$$

where $D^{-1/2}$ is the matrix such that $D^{-1/2}_{i,j} = 0$ if $D_{i,j} = 0$, and $D^{-1/2}_{i,j} = 1/\sqrt{D_{i,j}}$ otherwise. In turn, the optimization problem in (4) is the problem of computing the smallest eigenvalue of $D^{-1/2} A D^{-1/2}$, which is the same as computing the largest eigenvalue of the positive semidefinite matrix $I - D^{-1/2} A D^{-1/2}$.

Given an $n \times n$ positive semidefinite matrix $M$ with $T$ non-zero entries and of largest eigenvalue $\lambda_1$, and a parameter $\delta$, it is possible to find a vector $x$ such that $x^T M x \ge \lambda_1 \cdot (1 - \delta) \cdot x^T x$ in randomized time $O(\delta^{-1} \cdot (T + n) \cdot \log n)$ [KW92]. Applying the algorithm to $I - D^{-1/2} A D^{-1/2}$, which, as proved above, has a largest eigenvalue which is at least $2 - 2\epsilon$, and which has $|E| + |V|$ non-zero entries, we find in randomized time $O(\delta^{-1} \cdot (|E| + |V|) \cdot \log |V|)$ a vector $x'$ such that

$$\frac{x'^T (I - D^{-1/2} A D^{-1/2}) x'}{x'^T x'} \ge 2 - 2\epsilon - \delta$$

and so

$$x'^T D^{-1/2} A D^{-1/2} x' \le (2\epsilon + \delta - 1) \cdot x'^T x'$$

and, if we define $x'' := D^{-1/2} x'$, then

$$x''^T A x'' \le (2\epsilon + \delta - 1) \cdot x''^T D x''$$

which we can rewrite

$$x''^T (A + D) x'' \le (2\epsilon + \delta) \cdot x''^T D x''$$
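To make the "furthermore" argument concrete, here is a minimal sketch that runs plain power iteration on $I - D^{-1/2} A D^{-1/2}$ and then rescales by $D^{-1/2}$. Plain power iteration is a stand-in chosen for brevity; the running-time guarantee above is the one cited from [KW92], and the fixed iteration count used here is an assumption, not a bound from the text. Every vertex is assumed to have positive degree.

```python
import math
import random


def low_quotient_vector(n, edges, iterations, rng):
    """Power iteration on I - D^{-1/2} A D^{-1/2}; returns x'' = D^{-1/2} x'."""
    deg = [0.0] * n
    for i, j, w in edges:
        deg[i] += w
        deg[j] += w
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # z = (I - D^{-1/2} A D^{-1/2}) x, computed edge by edge
        z = list(x)
        for i, j, w in edges:
            s = w / math.sqrt(deg[i] * deg[j])
            z[i] -= s * x[j]
            z[j] -= s * x[i]
        norm = math.sqrt(sum(v * v for v in z))
        x = [v / norm for v in z]
    return [x[i] / math.sqrt(deg[i]) for i in range(n)]


def quotient(n, edges, x):
    """x^T (D + A) x / x^T D x; the top equals sum over edges of w (x_i + x_j)^2."""
    deg = [0.0] * n
    for i, j, w in edges:
        deg[i] += w
        deg[j] += w
    top = sum(w * (x[i] + x[j]) ** 2 for i, j, w in edges)
    return top / sum(deg[i] * x[i] ** 2 for i in range(n))
```

On a bipartite graph the quotient of the returned vector tends to 0, matching $\epsilon = 0$ in Lemma 2.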

3.2 Proof of Lemma 3 

We now come to our main result. 

The condition $x^T (D + A) x \le \epsilon \cdot x^T D x$ is equivalent to

$$\frac{1}{2} \sum_{i,j} A_{i,j} (x_i + x_j)^2 \le \epsilon \sum_i d_i x_i^2 \qquad (5)$$
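The equivalence rests on a one-line expansion of the quadratic form, written out here for reference; it uses only the symmetry of $A$ and the definition $d_i = \sum_j A_{i,j}$.

```latex
\begin{align*}
\sum_{i,j} A_{i,j} (x_i + x_j)^2
  &= \sum_{i,j} A_{i,j} x_i^2 + \sum_{i,j} A_{i,j} x_j^2 + 2 \sum_{i,j} A_{i,j} x_i x_j \\
  &= 2 \sum_i d_i x_i^2 + 2\, x^T A x \\
  &= 2\, x^T (D + A) x .
\end{align*}
```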



Before starting the formal proof, we describe a heuristic argument that gives some intuition for the 
actual proof. 






Proof Idea. Equation (5) states that the average value of $(x_i + x_j)^2$, for an edge $(i,j)$, is at most $\epsilon$ times the average value of $x_i^2$ and $x_j^2$. So, non-rigorously, we would guess that for a typical edge the value of $|x_i + x_j|$ is at most about $\sqrt{\epsilon}$ times $|x_i| + |x_j|$. For this to happen, it must be the case that $x_i$ and $x_j$ have different signs, and their absolute value is nearly the same; that is, for some positive $c$, $x_i = -c$ and $x_j = c(1 - \sqrt{\epsilon})$. Suppose now that we pick a random threshold $t$, and we define $y_i = -1 \Leftrightarrow x_i < -t$ and $y_i = 1 \Leftrightarrow x_i > t$. Then $|y_i + y_j|$ is 1 with probability $c\sqrt{\epsilon}$ and zero otherwise, while $|y_i|$ and $|y_j|$ are 1 with probability roughly $c$ and zero otherwise; then it follows that the expectation of $\sum_{i,j} A_{i,j} |y_i + y_j|$ is about a $\sqrt{\epsilon}$ fraction of the expectation of $\sum_i d_i |y_i|$.



Our algorithm, which we call the 2-Thresholds Spectral Cut algorithm and abbreviate 2TSC, is as 
follows: 

• Algorithm 2TSC

• For every vertex $k$

  - Define the vector $y^k \in \{-1, 0, 1\}^V$ as follows:

    $y^k_i = -1$ iff $x_i \le -|x_k|$
    $y^k_i = 1$ iff $x_i \ge |x_k|$
    $y^k_i = 0$ iff $|x_i| < |x_k|$

• Output the vector $y^k$ for which the ratio

$$\frac{\sum_{i,j} A_{i,j} |y^k_i + y^k_j|}{\sum_i d_i |y^k_i|}$$

is smallest



The algorithm can be implemented to run in $O(|E| + |V| \log |V|)$ time. We first sort the vertices according to the value of $|x_i|$, and so we assume we have $|x_1| \le |x_2| \le \cdots \le |x_n|$ when we run 2TSC. At each step $k$, we need to modify the vector $y$ only in positions $k$ and $k - 1$, and the cost of recomputing the ratio is only $O(d_k + d_{k-1})$, so that all the $n$ steps together take time $O(|E|)$.

We need to argue that, under the assumption of the Lemma, the algorithm outputs a vector $y$ such that the ratio in (2) is at most $\sqrt{8\epsilon}$.
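A direct transcription of 2TSC is short. The sketch below recomputes the ratio from scratch for every threshold, so it takes $O(|V| \cdot |E|)$ time rather than the sorted, incremental $O(|E| + |V| \log |V|)$ implementation described above; it is meant only to illustrate the rounding. The function name is hypothetical, and coordinates equal to zero are always rounded to zero, which is an assumed tie rule.

```python
def two_thresholds_spectral_cut(n, edges, x):
    """Try every threshold |x_k|; return (ratio, y) minimising
    sum_{i,j} A_ij |y_i + y_j| / sum_i d_i |y_i|.

    `edges` lists each undirected edge once as (i, j, w).
    """
    deg = [0.0] * n
    for i, j, w in edges:
        deg[i] += w
        deg[j] += w
    best = None
    for k in range(n):
        t = abs(x[k])
        # y_i keeps the sign of x_i when |x_i| >= t, and is 0 otherwise
        y = [0 if abs(v) < t or v == 0 else (1 if v > 0 else -1) for v in x]
        denominator = sum(deg[i] * abs(y[i]) for i in range(n))
        if denominator == 0:
            continue
        # each undirected edge appears twice in the sum over ordered pairs
        numerator = sum(2 * w * abs(y[i] + y[j]) for i, j, w in edges)
        ratio = numerator / denominator
        if best is None or ratio < best[0]:
            best = (ratio, y)
    return best
```

For a vector $x$ satisfying the hypothesis of Lemma 3 with parameter $\epsilon$, the returned ratio should be at most $\sqrt{8\epsilon}$.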

In order to analyze 2TSC, we study the following randomized process: 

• Pick a value $t$ uniformly in $[0, \max_i x_i^2]$;

• Define $Y \in \{-1, 0, 1\}^V$ as follows:

    $Y_i = -1$ iff $x_i \le -\sqrt{t}$
    $Y_i = 1$ iff $x_i \ge \sqrt{t}$
    $Y_i = 0$ iff $|x_i| < \sqrt{t}$

Every $Y$ that is generated by the probabilistic process with positive probability is considered by algorithm 2TSC at some stage; this implies that if algorithm 2TSC outputs a vector $y$ such that $\sum_{i,j} A_{i,j} |y_i + y_j| > \sqrt{8\epsilon} \sum_i d_i |y_i|$, then in the randomized process we must have $\sum_{i,j} A_{i,j} |Y_i + Y_j| > \sqrt{8\epsilon} \sum_i d_i |Y_i|$ with probability 1 and, in particular, $E \sum_{i,j} A_{i,j} |Y_i + Y_j| > \sqrt{8\epsilon} \, E \sum_i d_i |Y_i|$.

We shall prove that $E \sum_{i,j} A_{i,j} |Y_i + Y_j| \le \sqrt{8\epsilon} \, E \sum_i d_i |Y_i|$ and so we shall conclude that the output of algorithm 2TSC satisfies the claim of the Lemma.

Since Equation (5) and the distribution of $Y$ are invariant under multiplying $x$ by a scalar, we may assume that $\max_i |x_i| = 1$, so that $t$ is chosen uniformly in $[0, 1]$.

A case analysis shows that, for every edge $(i,j)$,

$$E |Y_i + Y_j| \le |x_i + x_j| \cdot (|x_i| + |x_j|) \qquad (6)$$

To verify Equation (6) we need to distinguish the case in which $x_i$ and $x_j$ have different signs from the case in which they have the same sign. We assume without loss of generality that $|x_i| \ge |x_j|$.

• If they have different signs, then $|Y_i + Y_j| = 1$ when $|x_j|^2 < t \le |x_i|^2$, and zero otherwise. Indeed, if $t \le |x_j|^2$, then $Y_i = -Y_j$ and $|Y_i + Y_j| = 0$, and if $t > |x_i|^2$ then $Y_i = Y_j = 0$.

So $E |Y_i + Y_j|$ equals $|x_i|^2 - |x_j|^2$, which is equal to the right-hand side of Equation (6).

• If they have the same sign, then $|Y_i + Y_j| = 2$ when $t \le |x_j|^2$, $|Y_i + Y_j| = 1$ when $|x_j|^2 < t \le |x_i|^2$, and $|Y_i + Y_j| = 0$ when $t > |x_i|^2$.

Overall, $E |Y_i + Y_j|$ equals $2 x_j^2 + (x_i^2 - x_j^2) = x_i^2 + x_j^2$. The right-hand side of Equation (6) is $(x_i + x_j)^2$, which is only larger.

Note also that $E |Y_i| = x_i^2$.
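The two cases can be sanity-checked numerically. The sketch below approximates $E |Y_i + Y_j|$ by averaging over a fine midpoint grid of thresholds $t \in [0, 1]$ (assuming, as in the normalization above, that $|x_i|, |x_j| \le 1$) and compares it with the right-hand side of (6). The function names and the grid size are illustrative choices.

```python
def expected_abs_sum(xi, xj, steps=10000):
    """Approximate E|Y_i + Y_j| for t uniform in [0, 1] on a midpoint grid."""
    def rounded(v, t):
        # Y = 0 when v^2 < t, otherwise the sign of v
        if v * v < t:
            return 0
        return 1 if v > 0 else -1

    total = 0
    for k in range(steps):
        t = (k + 0.5) / steps
        total += abs(rounded(xi, t) + rounded(xj, t))
    return total / steps


def bound(xi, xj):
    """Right-hand side of Equation (6)."""
    return abs(xi + xj) * (abs(xi) + abs(xj))
```

With opposite signs the two quantities coincide up to grid error; with equal signs the expectation is $x_i^2 + x_j^2$, strictly below the bound.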

To complete our argument it remains to apply Cauchy-Schwarz and standard manipulations.

$$E \sum_{i,j} A_{i,j} |Y_i + Y_j| \le \sum_{i,j} A_{i,j} |x_i + x_j| \cdot (|x_i| + |x_j|) \le \sqrt{\sum_{i,j} A_{i,j} |x_i + x_j|^2} \cdot \sqrt{\sum_{i,j} A_{i,j} (|x_i| + |x_j|)^2}$$

By our assumption,

$$\sum_{i,j} A_{i,j} |x_i + x_j|^2 \le 2\epsilon \sum_i d_i x_i^2$$

and it is a standard calculation that

$$\sum_{i,j} A_{i,j} (|x_i| + |x_j|)^2 \le 2 \sum_{i,j} A_{i,j} (|x_i|^2 + |x_j|^2) = 4 \sum_i d_i x_i^2$$

and so

$$E \sum_{i,j} A_{i,j} |Y_i + Y_j| \le \sqrt{8\epsilon} \sum_i d_i x_i^2 = \sqrt{8\epsilon} \, E \sum_i d_i |Y_i|$$

i,j i i 

This completes the proof that Algorithm 2TSC performs as required by the Lemma. 

4 Approximation for Max Cut 

In this section we analyze the following algorithm 

• Algorithm: Recursive-Spectral-Cut

• Input: graph $G = (V, E)$, accuracy parameter $\delta$

• Run the algorithm of Theorem 1 with accuracy parameter $\delta$, and let $y \in \{-1, 0, 1\}^V$ be the solution found by the algorithm; call $M$ the weighted number of edges $(i,j)$ such that at least one of $y_i$ or $y_j$ is non-zero, $C$ the weighted number of cut edges such that $y_i, y_j$ are both non-zero and have opposite signs, and $X$ the weighted number of cross edges such that exactly one of $y_i, y_j$ is zero;

• If $C + \frac{1}{2} X \le \frac{1}{2} M$, then find a partition of $V$ that cuts $\ge |E|/2$ edges, and return it.

• If $C + \frac{1}{2} X > \frac{1}{2} M$, then let $L := \{i : y_i = -1\}$, $R := \{i : y_i = 1\}$, $V' := \{i : y_i = 0\}$, let $G' = (V', E')$ be the graph induced by $V'$, recursively call Recursive-Spectral-Cut on $G'$, and let $V_1, V_2$ be the partition found by the algorithm; return $(V_1 \cup L, V_2 \cup R)$ or $(V_1 \cup R, V_2 \cup L)$, whichever is better.
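The control flow of the recursion can be sketched as follows. The two subroutines are passed in as parameters and are hypothetical helpers: `find_y` plays the role of the algorithm of Theorem 1 and `greedy` returns any partition cutting at least half of the edges. Graphs are given as a vertex list and an edge list `(i, j, w)`; these conventions are assumptions made for illustration.

```python
def recursive_spectral_cut(vertices, edges, find_y, greedy):
    """Return a dict vertex -> side (0 or 1).

    find_y(vertices, edges) -> dict vertex -> {-1, 0, 1}   (hypothetical helper)
    greedy(vertices, edges) -> dict vertex -> {0, 1}       (hypothetical helper)
    """
    if not edges:
        return {v: 0 for v in vertices}
    y = find_y(vertices, edges)
    M = sum(w for i, j, w in edges if y[i] != 0 or y[j] != 0)
    C = sum(w for i, j, w in edges if y[i] * y[j] == -1)
    X = sum(w for i, j, w in edges if (y[i] == 0) != (y[j] == 0))
    if C + X / 2 <= M / 2:
        return greedy(vertices, edges)
    # recurse on the graph induced by the vertices with y_i = 0
    rest = [v for v in vertices if y[v] == 0]
    rest_edges = [(i, j, w) for i, j, w in edges if y[i] == 0 and y[j] == 0]
    side = recursive_spectral_cut(rest, rest_edges, find_y, greedy)
    # put L and R back in both possible orientations and keep the better one
    options = []
    for left, right in ((0, 1), (1, 0)):
        full = dict(side)
        for v in vertices:
            if y[v] == -1:
                full[v] = left
            elif y[v] == 1:
                full[v] = right
        cut = sum(w for i, j, w in edges if full[i] != full[j])
        options.append((cut, full))
    return max(options, key=lambda option: option[0])[1]
```

The recursion terminates because the second branch is only taken when $M > 0$, so at least one vertex is removed at every level.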

Note that the algorithm runs in randomized time $O(\delta^{-2} \cdot |V| \cdot (|V| + |E|) \cdot \log |V|)$ because each iteration takes time $O(\delta^{-2} \cdot (|V| + |E|) \cdot \log |V|)$ and there are at most $|V|$ iterations.

In a preliminary version of this paper we presented a simple argument showing that if $opt \ge 1 - \epsilon$, then the algorithm cuts at least a $1 - O(\epsilon^{1/3}) - \delta$ fraction of edges. The following tighter argument is due to Moses Charikar (personal communication, July 2008).

Theorem 4 If Algorithm Recursive-Spectral-Cut receives in input a graph $G = (V, E)$ whose optimum is $1 - \epsilon$, with $\epsilon \le 1/16$, then it finds a solution that cuts at least a $1 - 4\sqrt{\epsilon} + 8\epsilon - \frac{\delta}{2}$ fraction of edges.

Proof: Consider the $t$-th iteration of the algorithm, and let $G_t$ be the residual graph at that iteration, and let $\rho_t \cdot |E|$ be the number of edges of $G_t$. Then we observe that the Max Cut optimum in $G_t$ is at least $1 - \epsilon/\rho_t$.

Let $S_t$ be the set of vertices and $L_t, R_t$ the partition found by the algorithm of Theorem 1. Let $G_{t+1}$ be the residual graph at the following step, and $\rho_{t+1} \cdot |E|$ the number of edges of $G_{t+1}$. (If the algorithm stops at the $t$-th iteration, we shall take $G_{t+1}$ to be the empty graph; if the algorithm discards $L_t, R_t$ and chooses a greedy cut, we shall take $G_{t+1}$ to be empty and $L_t, R_t$ to be the partition given by the greedy cut.)

We know by Theorem 1 that the algorithm will cut at least a $1 - 2\sqrt{\epsilon/\rho_t} - \delta/2$ fraction of the $|E| \cdot (\rho_t - \rho_{t+1})$ edges incident on $S_t$.

Indeed, we know that at least a $\max\{1/2, 1 - 2\sqrt{\epsilon/\rho_t} - \delta/2\}$ fraction of those edges are cut (for small values of $\rho_t$, it is possible that $1 - 2\sqrt{\epsilon/\rho_t} - \delta/2 < 1/2$, but the algorithm is always guaranteed to cut at least half of the edges incident on $S_t$). This means that any convex combination of $1/2$ and $1 - 2\sqrt{\epsilon/\rho_t} - \delta/2$ is still a lower bound on the fraction of edges incident on $S_t$ cut by the algorithm.

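If both $\rho_t$ and $\rho_{t+1}$ are at least $16\epsilon$, we are going to use the lower bound

$$|E| \cdot (\rho_t - \rho_{t+1}) \cdot \left(1 - 2\sqrt{\frac{\epsilon}{\rho_t}} - \frac{\delta}{2}\right) \ge |E| \cdot \int_{\rho_{t+1}}^{\rho_t} \left(1 - 2\sqrt{\frac{\epsilon}{r}} - \frac{\delta}{2}\right) dr$$

If $\rho_t > 16\epsilon > \rho_{t+1}$, then we use the lower bound

$$|E| \cdot (\rho_t - 16\epsilon) \cdot \left(1 - 2\sqrt{\frac{\epsilon}{\rho_t}} - \frac{\delta}{2}\right) + |E| \cdot (16\epsilon - \rho_{t+1}) \cdot \frac{1}{2} \ge |E| \cdot \int_{16\epsilon}^{\rho_t} \left(1 - 2\sqrt{\frac{\epsilon}{r}} - \frac{\delta}{2}\right) dr + |E| \cdot \int_{\rho_{t+1}}^{16\epsilon} \frac{1}{2} \, dr$$

Finally, if both $\rho_t$ and $\rho_{t+1}$ are smaller than $16\epsilon$, we use the lower bound

$$|E| \cdot (\rho_t - \rho_{t+1}) \cdot \frac{1}{2} = |E| \cdot \int_{\rho_{t+1}}^{\rho_t} \frac{1}{2} \, dr$$

Summing those bounds, we have that the number of edges cut by the algorithm is at least

$$|E| \cdot \left( \int_{16\epsilon}^{1} \left(1 - 2\sqrt{\frac{\epsilon}{r}} - \frac{\delta}{2}\right) dr + \int_{0}^{16\epsilon} \frac{1}{2} \, dr \right) = |E| \cdot \left(1 - 4\sqrt{\epsilon} + 8\epsilon - (1 - 16\epsilon) \cdot \frac{\delta}{2}\right)$$

□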
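For completeness, the two integrals in the final bound evaluate as follows; this worked computation uses only the quantities defined in the proof and the antiderivative $\int r^{-1/2} \, dr = 2\sqrt{r}$.

```latex
\begin{align*}
\int_{16\epsilon}^{1} \left(1 - 2\sqrt{\frac{\epsilon}{r}} - \frac{\delta}{2}\right) dr
  &= (1 - 16\epsilon) - 4\sqrt{\epsilon}\left(1 - 4\sqrt{\epsilon}\right) - \frac{\delta}{2}(1 - 16\epsilon) \\
  &= 1 - 4\sqrt{\epsilon} - \frac{\delta}{2}(1 - 16\epsilon), \\
\int_{0}^{16\epsilon} \frac{1}{2} \, dr &= 8\epsilon .
\end{align*}
```

Adding the two lines gives $1 - 4\sqrt{\epsilon} + 8\epsilon - (1 - 16\epsilon) \cdot \delta/2$, which is at least $1 - 4\sqrt{\epsilon} + 8\epsilon - \delta/2$.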



Corollary 5 Algorithm Recursive-Spectral-Cut is a $.531128 - \delta$ approximate algorithm for Max Cut.

Proof: Write $opt = 1 - \epsilon$. If $\epsilon \ge 1/16$ then the algorithm finds a solution of cost $\ge 1/2$ and the approximation ratio is at least $16/30 > .533333$.

If $\epsilon \le 1/16$, then the algorithm finds a solution of cost at least $1 - 4\sqrt{\epsilon} + 8\epsilon - \delta/2$, and the approximation ratio is at least

$$\frac{1 - 4\sqrt{\epsilon} + 8\epsilon - \delta/2}{1 - \epsilon} \ge \frac{1 - 4\sqrt{\epsilon} + 8\epsilon}{1 - \epsilon} - \delta$$

If we call $\rho(\epsilon) := \frac{1 - 4\sqrt{\epsilon} + 8\epsilon}{1 - \epsilon}$, then some calculus shows that, for $0 \le \epsilon \le 1/16$, $\rho(\epsilon)$ is minimized at $\epsilon \approx .05496$ (where $\sqrt{\epsilon}$ is the smallest root of $-2x^2 + 9x - 2 = 0$) and is always at least $.531128\ldots$. □
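The calculus step can be double-checked numerically. The sketch below evaluates the ratio $(1 - 4\sqrt{\epsilon} + 8\epsilon)/(1 - \epsilon)$ at the stationary point obtained from the quadratic $-2x^2 + 9x - 2 = 0$ in $x = \sqrt{\epsilon}$, and compares it with a brute-force minimum over a grid on $[0, 1/16]$. The variable names and the grid resolution are illustrative choices.

```python
import math


def rho(eps):
    """Approximation ratio (1 - 4 sqrt(eps) + 8 eps) / (1 - eps)."""
    return (1 - 4 * math.sqrt(eps) + 8 * eps) / (1 - eps)


# smallest root of -2 s^2 + 9 s - 2 = 0, where s = sqrt(eps)
s_star = (9 - math.sqrt(65)) / 4
eps_star = s_star ** 2

# brute-force minimum over 100001 evenly spaced points of [0, 1/16]
grid_min = min(rho(k / 1600000) for k in range(100001))
```

Both `rho(eps_star)` and `grid_min` should agree with the constant in Corollary 5 to the digits quoted there.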



5 Relation to Cheeger's Inequality 

In this section we compare our main result, Theorem 1, with Cheeger's inequality [Alo86]. We restrict our discussion to the case of regular graphs.

If $G$ is a $d$-regular graph, $A$ is its adjacency matrix, and $M := \frac{1}{d} A$, then $M$ has $n$ eigenvalues, counting multiplicities, which we shall call $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. It is always the case that $\lambda_1 = 1$, and that $|\lambda_i| \le 1$ for every $i$. The extremal cases are captured by the following well-known facts:

1. $\lambda_2 = 1$ if and only if $G$ is disconnected, that is, if and only if there is a set $S$, $|S| \le |V|/2$, such that no edge of $G$ leaves $S$.

2. $\lambda_n = -1$ if and only if $G$ contains a bipartite connected component, that is, if and only if there is a set $S$ and a partition of $S$ into disjoint sets $L$, $R$, such that all edges incident on $S$ have one endpoint in $L$ and one endpoint in $R$.

Cheeger's inequality characterizes the cases in which $\lambda_2$ is close to 1 as those in which there is a set $S$, $|S| \le |V|/2$, such that the number of edges between $S$ and $V - S$ is small compared to $d|S|$.

If we define $h(G)$ to be the edge expansion of $G$,

$$h(G) := \min_{S \subseteq V : |S| \le |V|/2} \frac{edges(S, V - S)}{d|S|}$$

then we have Cheeger's inequality

$$\sqrt{2 \cdot (1 - \lambda_2)} \ge h(G) \ge \frac{1}{2} \cdot (1 - \lambda_2) \qquad (7)$$

Similarly, Lemmas 2 and 3 characterize the cases in which $\lambda_n$ is close to $-1$ as those in which there is a set $S$ and a partition $(L, R)$ of $S$ such that the number of edges incident on $S$ which fail to be cut by the partition is small compared to $d|S|$.

Define the bipartiteness ratio of a graph to be

$$\beta(G) := \min_{y \in \{-1,0,1\}^V} \frac{\sum_{i,j} A_{i,j} |y_i + y_j|}{2d \sum_i |y_i|}$$

which is equivalent to

$$\beta(G) = \min_{S \subseteq V, \ (L,R) \text{ partition of } S} \frac{2 \, edges(L) + 2 \, edges(R) + edges(S, V - S)}{d|S|}$$

then we have

$$\sqrt{2 \cdot (1 - |\lambda_n|)} \ge \beta(G) \ge \frac{1}{2} \cdot (1 - |\lambda_n|) \qquad (8)$$

There are examples in which both inequalities in (8) are tight within constant factors.

If we take an odd cycle with $n$ vertices, then $\beta(G) \ge \frac{1}{n}$, because for every subset $S$ of vertices and for every bipartition of $S$ there is at least one failed edge, and the number of edges incident on $S$ is at most $n$. In an odd cycle, however, $d = 2$ and the smallest eigenvalue of the adjacency matrix has absolute value $2 - O(1/n^2)$, that is, $|\lambda_n| = 1 - O(1/n^2)$, and so $\beta$ is as large as $\Omega(\sqrt{1 - |\lambda_n|})$.
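For very small graphs the bipartiteness ratio can be computed by brute force over all $3^n$ vectors $y \in \{-1, 0, 1\}^V$, which gives a quick check of inequality (8) on an odd cycle. The sketch below is illustrative only: the function name is hypothetical, the graph is assumed unweighted and $d$-regular, and the closed form $-\cos(\pi/n)$ for the smallest eigenvalue of $\frac{1}{2} A$ on an odd $n$-cycle is a standard fact about cycles rather than something stated in the text.

```python
import itertools
import math


def bipartiteness_ratio(n, d, edges):
    """Brute-force beta(G) for a small d-regular unweighted graph.

    `edges` lists each undirected edge once as (i, j).
    """
    best = None
    for y in itertools.product((-1, 0, 1), repeat=n):
        size = sum(abs(v) for v in y)
        if size == 0:
            continue
        # per undirected edge: 2 if inside L or R, 1 if it leaves S, 0 if cut
        failed = sum(abs(y[i] + y[j]) for i, j in edges)
        value = failed / (d * size)
        if best is None or value < best:
            best = value
    return best


n = 5
cycle = [(i, (i + 1) % n) for i in range(n)]
beta = bipartiteness_ratio(n, 2, cycle)
lam_n = -math.cos(math.pi / n)  # smallest eigenvalue of A/2 on an odd cycle
gap = 1 - abs(lam_n)
```

One can then compare `beta` with `math.sqrt(2 * gap)` and `gap / 2`, the two sides of (8).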

To see the tightness of the other inequality, start from a $k$-regular expander such that, say, $\max\{\lambda_2, |\lambda_n|\} \le 1/2$. (Such graphs exist for constant $k$.) Then construct $G$ by taking the disjoint union of the edges of the expander and the edges of a $k \cdot (1 - \epsilon)/\epsilon$-regular bipartite graph, so that the resulting graph is $d$-regular with $d := k/\epsilon$. There is a cut that cuts all the edges of the bipartite graph, so $\beta(G) \le \epsilon$, but the smallest eigenvalue of $M$ is at least $-1 + k/2d \ge -1 + \epsilon/2$, meaning that $\beta$ is $O(1 - |\lambda_n(G)|)$.

Our results, as stated in (8), are not just syntactically similar to Cheeger's inequality: there are also similarities between the proof of Cheeger's inequality and of Theorem 1. The analysis in Cheeger's inequality relies on the study of the quadratic form

$$\sum_{i,j} A(i,j) \cdot (x_i - x_j)^2 \qquad (9)$$

and it is based on the intuition that if (9) is small compared to $\sum_i x_i^2$ then for most edges $(i, j)$ we have $x_i \approx x_j$.

Our analysis was based on the study of the quadratic form

$$\sum_{i,j} A(i,j) \cdot (x_i + x_j)^2 \qquad (10)$$

and the intuition that if (10) is small compared to $\sum_i x_i^2$ then for most edges we have $x_i \approx -x_j$.

6 Relation to the Goemans-Williamson Relaxation

The dual of the Goemans-Williamson relaxation is

$$\begin{array}{ll} \min & |E| - \frac{1}{4} \sum_i y_i \\ \text{subject to} & D + A - \mathrm{diag}(y_1, \ldots, y_n) \succeq 0 \end{array} \qquad (11)$$

We can see Lemma 2 as stating a special case of the weak duality fact that the cost of every feasible solution to (11) is an upper bound to the optimal cut in the graph.

Indeed, if the optimal cut is of size $\ge |E| \cdot (1 - \epsilon)$, then no solution of cost $< |E| \cdot (1 - \epsilon)$ can be feasible for (11). In particular, the solution $y_i = 2\epsilon d_i$ has cost $1 - \epsilon$ and cannot be feasible, meaning that $D(1 - 2\epsilon) + A$ cannot be positive semidefinite, and there is a vector $x$ such that $x^T (D(1 - 2\epsilon) + A) x < 0$.
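The weak duality fact can be verified in two lines. In the sketch below, $x \in \{-1, 1\}^V$ is the indicator vector of an arbitrary cut, $cut(x)$ denotes the number of edges it cuts (notation introduced here only for this computation), and $y$ is any feasible solution of (11); the computation uses $x^T D x = 2|E|$ and $x^T A x = 2|E| - 4 \, cut(x)$.

```latex
\begin{align*}
0 \le x^T \left(D + A - \mathrm{diag}(y_1, \ldots, y_n)\right) x
  &= 2|E| + \left(2|E| - 4\, cut(x)\right) - \sum_i y_i , \\
\text{hence} \qquad cut(x) &\le |E| - \frac{1}{4} \sum_i y_i .
\end{align*}
```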
In turn, Lemma 3 has the following primal-dual interpretation: given a graph $G$, there is an $\epsilon$ such that algorithm 2TSC finds $L$, $R$ such that $C + \frac{1}{2} X \ge (1 - 2\sqrt{\epsilon} - \delta/2) M$, and the solution $y_i := 2\epsilon d_i$ is feasible for (11), thus showing that the Max Cut optimum is at most $1 - \epsilon$.

Given this premise, we can now view algorithm Recursive-Spectral-Cut as a primal-dual
algorithm.

At step $t$ of the recursion, let $\rho_t|E|$ be the number of edges in the residual graph $G_t$, and $C_t$ and
$X_t$ be the number of cut and cross edges in the solution $L_t, R_t$ found by the algorithm. Define $\epsilon_t$
so that $1 - \epsilon_t/\rho_t$ is the upper bound on the Max Cut of $G_t$ given by the dual solution associated
to the algorithm as above, and the algorithm satisfies $C_t + \frac{1}{2}X_t \ge (1 - 2\sqrt{\epsilon_t/\rho_t} - \delta/2)M_t$. Then
the dual solution at time $t$ also proves an upper bound $1 - \epsilon_t$ to the Max Cut optimum of $G$. Let
$\epsilon := \max_t \epsilon_t$; then we have (i) a dual solution proving that the Max Cut of $G$ is $\le 1 - \epsilon$, and we
know that (ii) at every step $t$ we have $C_t + \frac{1}{2}X_t \ge (1 - 2\sqrt{\epsilon/\rho_t} - \delta/2)M_t$. From fact (ii) and the
analysis done in the proof of Theorem 1 we see the algorithm outputs a solution that cuts at least
a $1 - 4\sqrt{\epsilon} + 8\epsilon - \delta/2$ fraction of edges, and it is able to output a feasible dual solution to the GW
relaxation proving a $1 - \epsilon$ upper bound to the optimum.

In particular, the ratio between the cost of the solution found by the algorithm and the upper 
bound provided by the dual solution is always at least .531. 



7 Quadratic Programming and the Max CutGain Problem 

Let $A$ be the adjacency matrix of a weighted graph with no self-loops, possibly with negative
weights, let $d_i := \sum_j |A_{i,j}|$ be the weighted degree of node $i$, and $D := \mathrm{diag}(d_1, \ldots, d_n)$. Max Cut
Gain is the optimization problem

$$\max_{y \in \{-1,1\}^V} \frac{-y^T A y}{y^T D y} \tag{12}$$

In words, Max Cut Gain is the maximum, over all cuts, of the difference between the number of cut 
edges and the number of edges that are not cut, divided by the total number of edges. Equivalently, 
the optimum of Max Cut Gain is $\epsilon$ if and only if the optimum of Max Cut is $\frac{1}{2} + \frac{1}{2}\epsilon$. (The name of
the problem comes from the fact that one is measuring how much one gains by using an optimum 
cut compared to a random cut, which only cuts a 1/2 fraction of edges.) 
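
The relation between Max Cut and Max Cut Gain can be checked by brute force on a tiny instance. The sketch below assumes an unweighted graph given as an edge list and enumerates all $y \in \{-1,1\}^V$; the function names are hypothetical, and the exponential enumeration is only meant for graphs with a handful of vertices.

```python
from itertools import product

def cut_fraction(edges, y):
    """Fraction of edges whose endpoints get different signs in y."""
    return sum(1 for i, j in edges if y[i] != y[j]) / len(edges)

def max_cut_and_gain(n, edges):
    """Brute force over all y in {-1,1}^n; returns (Max Cut, Max Cut Gain)."""
    best_cut, best_gain = 0.0, -1.0
    for y in product((-1, 1), repeat=n):
        c = cut_fraction(edges, y)
        best_cut = max(best_cut, c)
        best_gain = max(best_gain, c - (1.0 - c))  # cut minus uncut, per edge
    return best_cut, best_gain

triangle = [(0, 1), (1, 2), (0, 2)]
cut, gain = max_cut_and_gain(3, triangle)
print(cut, gain)
```

On the triangle the best cut has two of the three edges, so the gain is $1/3$ and the Max Cut value is $\frac{1}{2} + \frac{1}{2} \cdot \frac{1}{3} = \frac{2}{3}$.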

Note that, up to the scaling that we do by dividing by $y^T D y = \sum_i d_i$, we are considering the
problem

$$\max_{y \in \{-1,1\}^V} y^T Q y \tag{13}$$

where Q is an arbitrary symmetric matrix with zeroes on the diagonal. Apart from the restriction 
to symmetric matrices, this is the same family of quadratic programs studied by Charikar and 
Wirth [CW04]. It helps intuition, however, to continue to think about A = —Q as the adjacency 
matrix of a weighted undirected graph. 

We define the gain ratio of a graph as the quantity

$$\gamma(G) := \max_{y \in \{-1,0,1\}^V} \frac{-y^T A y}{y^T D y} \tag{14}$$

In the gain ratio, we consider all subsets $S \subseteq V$ of vertices, and all partitions $(L, R = S - L)$ of
the set $S$; the objective function is twice the difference between the number of cut edges and the
number of uncut edges among the edges induced by $S$, divided by the volume of $S$. If one imposed the additional
constraint $S = V$, then one would recover the Max Cut Gain problem.
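
The difference between the gain ratio and Max Cut Gain shows up already on a triangle. The following sketch brute-forces both quantities for an unweighted graph; the function `gain_ratio` and its `allow_zero` flag are hypothetical names chosen for this example.

```python
from itertools import product

def gain_ratio(n, edges, allow_zero=True):
    """Max of -y^T A y / y^T D y over y in {-1,0,1}^n (or {-1,1}^n if allow_zero is False)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    values = (-1, 0, 1) if allow_zero else (-1, 1)
    best = None
    for y in product(values, repeat=n):
        vol = sum(deg[i] * abs(y[i]) for i in range(n))
        if vol == 0:
            continue  # y = 0 encodes the empty set S and is skipped
        # Each undirected edge appears twice in the ordered sum y^T A y.
        ratio = -2 * sum(y[i] * y[j] for i, j in edges) / vol
        if best is None or ratio > best:
            best = ratio
    return best

triangle = [(0, 1), (1, 2), (0, 2)]
print(gain_ratio(3, triangle), gain_ratio(3, triangle, allow_zero=False))
```

Taking $S$ to be the two endpoints of a single edge gives ratio $2/4 = 1/2$, whereas forcing $S = V$ gives only $1/3$, the Max Cut Gain of the triangle.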

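Let $\lambda_n$ be the smallest eigenvalue of the matrix $M := D^{-1/2} A D^{-1/2}$; then we see that

$$\gamma(G) \le |\lambda_n| \tag{15}$$

because

$$|\lambda_n| = -\min_{z \in \mathbb{R}^V} \frac{z^T M z}{z^T z} = \max_{x \in \mathbb{R}^V} \frac{-x^T A x}{x^T D x} \ge \max_{y \in \{-1,0,1\}^V} \frac{-y^T A y}{y^T D y} = \gamma(G)$$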

We conjecture that

$$\gamma(G) \ge \Omega\left(\frac{|\lambda_n|}{\log (1/|\lambda_n|)}\right) \tag{16}$$

but we are only able to prove the considerably weaker result that $\gamma(G) \ge e^{-O(1/|\lambda_n|)}$.

We use the following approach. Let $x \in \mathbb{R}^V$ be a real vector, and $Y$ be a distribution over discrete
vectors $\{-1,0,1\}^V$. We say that $Y$ is a $(c_1, c_2, \delta)$-good (randomized) rounding of $x$ if

1. $|c_1 \cdot \mathbb{E}\, Y_i Y_j - x_i x_j| \le \delta \cdot (x_i^2 + x_j^2)$

2. $\mathbb{E}|Y_i| \le c_2 x_i^2$

We have the following simple fact: 

Claim 6 If $x$ is a vector such that $-x^T A x \ge \epsilon \cdot x^T D x$, and $Y$ is a $(c_1, c_2, \delta)$-good rounding of $x$,
then the support of $Y$ contains a vector $y \in \{-1,0,1\}^V$ such that

$$-y^T A y \ge \frac{1}{c_1 c_2} (\epsilon - 2\delta) \cdot y^T D y$$



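Proof: We have

$$\begin{aligned}
\mathbb{E}\left[-Y^T A Y\right] &= -\sum_{i,j} A_{i,j}\, \mathbb{E}\, Y_i Y_j \\
&\ge \frac{1}{c_1} \Bigl( -\sum_{i,j} A_{i,j} x_i x_j - \delta \sum_{i,j} |A_{i,j}| (x_i^2 + x_j^2) \Bigr) \\
&= \frac{1}{c_1} \bigl( -x^T A x - 2\delta\, x^T D x \bigr) \\
&\ge \frac{1}{c_1} (\epsilon - 2\delta)\, x^T D x \\
&\ge \frac{1}{c_1 c_2} \cdot (\epsilon - 2\delta)\, \mathbb{E} \sum_i d_i |Y_i|
\end{aligned}$$

and so, since $Y^T D Y = \sum_i d_i |Y_i|$,

$$\mathbb{E}\Bigl[ -Y^T A Y - \frac{1}{c_1 c_2} (\epsilon - 2\delta)\, Y^T D Y \Bigr] \ge 0$$

and in particular there must exist a vector $y \in \{-1,0,1\}^V$ such that

$$\frac{-y^T A y}{y^T D y} \ge \frac{1}{c_1 c_2} (\epsilon - 2\delta)$$

□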



Lemma 7 (Main) For every $x \in \mathbb{R}^V$ and every $\ell > 1$ there is a $(c_1, c_2, 1/\ell)$-good rounding of $x$
such that $c_1 \cdot c_2 \le \ell^{-1} \cdot e^{\ell}$.

Proof: Given $x$, we assume without loss of generality that $e^{\ell} |x_i| \le 1$ for every $i$, and we consider
the following distribution $Y$:

• Pick a threshold $t \in [0,1]$ so that $t^2$ is uniformly distributed in $[0,1]$;

• For every vertex $i$, pairwise independently:

- If $|x_i| > t$ or $|x_i| < t \cdot e^{-\ell}$, then set $Y_i := 0$;

- If $t \cdot e^{-\ell} \le |x_i| \le t$, then set $Y_i := \mathrm{sign}(x_i)$ with probability $|x_i|/t$, and $Y_i := 0$ with
probability $1 - |x_i|/t$.
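
The rounding distribution above can be sampled directly. The sketch below is one possible reading of it, with two labeled assumptions: coordinates are rounded fully independently (which is stronger than the pairwise independence required), and the function name `sample_rounding` is made up for this example.

```python
import math
import random

def sample_rounding(x, ell, rng):
    """Draw one sample (t, Y) from the threshold rounding of x.

    Assumption: coordinates are rounded independently given t, which in
    particular makes them pairwise independent.
    """
    t = math.sqrt(rng.random())  # t^2 is uniform in [0, 1)
    y = []
    for xi in x:
        a = abs(xi)
        if t == 0.0 or a > t or a < t * math.exp(-ell):
            y.append(0)                      # outside the band [t e^{-ell}, t]
        elif rng.random() < a / t:
            y.append(1 if xi > 0 else -1)    # sign(x_i) with probability |x_i| / t
        else:
            y.append(0)
    return t, y

rng = random.Random(0)
print(sample_rounding([0.3, -0.2, 0.05, 0.0], 1.0, rng))
```

Every nonzero coordinate of the sample lies in the band $t e^{-\ell} \le |x_i| \le t$ and carries the sign of $x_i$.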

We begin with the calculation of the expectations $\mathbb{E}|Y_i|$.

Claim 8 $\mathbb{E}|Y_i| = 2 \cdot (e^{\ell} - 1) \cdot x_i^2$

Proof: [Of Claim 8] The threshold $t$ is chosen according to a distribution whose density function
is $2t$ for $t \in [0,1]$; conditioned on a specific choice of $t$, the expectation of $|Y_i|$ is $0$ if $|x_i| > t$ or
$|x_i| < t e^{-\ell}$, and it is $|x_i|/t$ otherwise. Hence, we have

$$\mathbb{E}|Y_i| = \int_{|x_i|}^{|x_i| e^{\ell}} 2t \cdot \frac{|x_i|}{t}\, dt = \int_{|x_i|}^{|x_i| e^{\ell}} 2|x_i|\, dt = 2 \cdot (e^{\ell} - 1) \cdot x_i^2$$

□
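
The closed form for $\mathbb{E}|Y_i|$ can be sanity-checked numerically. The sketch below integrates the conditional expectation against the density $2t$ with a midpoint rule; it assumes $|x_i| e^{\ell} \le 1$ so that the whole band of thresholds fits inside $[0,1]$, and the function name is illustrative.

```python
import math

def expected_abs_y(xi, ell, steps=200_000):
    """Midpoint-rule estimate of E|Y_i| = int_0^1 2t * P[Y_i != 0 | t] dt."""
    a = abs(xi)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        if t * math.exp(-ell) <= a <= t:
            total += 2.0 * t * (a / t) * h   # density 2t times probability |x_i| / t
    return total

xi, ell = 0.1, 2.0   # |x_i| * e^ell is about 0.74, so the band stays inside [0, 1]
numeric = expected_abs_y(xi, ell)
closed_form = 2.0 * (math.exp(ell) - 1.0) * xi * xi
print(numeric, closed_form)
```

The two printed values agree up to the discretization error of the midpoint rule.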

Claim 8 tells us that we can take $c_2 = 2 \cdot (e^{\ell} - 1) < 2e^{\ell}$. The following two claims give us that we
can take $c_1 = 1/2\ell$, so that $c_1 c_2 \le \frac{1}{\ell} \cdot e^{\ell}$ as required.

Claim 9 If $|x_i| > e^{\ell} |x_j|$, then, for every $c$,

$$|c\, \mathbb{E}\, Y_i Y_j - x_i x_j| \le \frac{1}{\ell} x_i^2$$

Proof: [Of Claim 9] Just note that, under the assumption of the claim, $\mathbb{E}\, Y_i Y_j = 0$, and $|x_i x_j| \le
e^{-\ell} x_i^2 \le \frac{1}{\ell} x_i^2$. □

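Claim 10 If $|x_j| \le |x_i| \le e^{\ell} |x_j|$, then

$$\left| \frac{1}{2\ell}\, \mathbb{E}\, Y_i Y_j - x_i x_j \right| \le \frac{1}{\ell} x_i^2$$

Proof: [Of Claim 10] Consider the expectation of $Y_i Y_j$, $i \ne j$, conditioned on a fixed choice of $t$.

$Y_i Y_j = 0$ whenever $|x_i| > t$ or $|x_j| < t e^{-\ell}$. If $t$ is such that $|x_i| \le t \le |x_j| e^{\ell}$, then the conditional
expectation of $Y_i Y_j$ is $x_i x_j / t^2$. Overall, we have

$$\mathbb{E}\, Y_i Y_j = \int_{|x_i|}^{|x_j| e^{\ell}} 2t \cdot \frac{x_i x_j}{t^2}\, dt = 2 x_i x_j \int_{|x_i|}^{|x_j| e^{\ell}} \frac{1}{t}\, dt = 2 x_i x_j \Bigl( \ell - \ln \frac{|x_i|}{|x_j|} \Bigr)$$

So we have

$$\left| \frac{1}{2\ell}\, \mathbb{E}\, Y_i Y_j - x_i x_j \right| = \frac{1}{\ell} |x_i x_j| \ln \frac{|x_i|}{|x_j|} = \frac{1}{\ell} x_i^2 \cdot \frac{|x_j|}{|x_i|} \ln \frac{|x_i|}{|x_j|} \le \frac{1}{\ell} x_i^2$$

where the last inequality follows from the fact that $p \ln \frac{1}{p} \le 1$ for every $0 < p \le 1$. □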
The lemma now follows. □ 
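
The computation in Claim 10 can also be checked numerically. The sketch below assumes coordinates are independent given $t$ and that $|x_j| e^{\ell} \le 1$; it integrates the conditional expectation $x_i x_j / t^2$ against the density $2t$ and compares the result with $2 x_i x_j (\ell - \ln(|x_i|/|x_j|))$. All names are illustrative.

```python
import math

def expected_product(xi, xj, ell, steps=200_000):
    """Midpoint-rule estimate of E[Y_i Y_j] for the threshold rounding."""
    a, b = abs(xi), abs(xj)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        low = t * math.exp(-ell)
        if low <= a <= t and low <= b <= t:
            # density 2t times conditional expectation x_i x_j / t^2
            total += 2.0 * t * (xi * xj) / (t * t) * h
    return total

xi, xj, ell = 0.1, -0.05, 2.0   # satisfies |x_j| <= |x_i| <= e^ell |x_j|
numeric = expected_product(xi, xj, ell)
closed_form = 2.0 * xi * xj * (ell - math.log(abs(xi) / abs(xj)))
deviation = abs(numeric / (2.0 * ell) - xi * xj)
print(numeric, closed_form, deviation, xi * xi / ell)
```

The deviation printed third stays below the bound $x_i^2/\ell$ printed last, as the claim states.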

In order to make the proof constructive, we need to show that we can find a vector $y$ in the sample
space of $Y$ as in the conclusion of the lemma. Suppose that the distribution of $Y$ described above
is such that $-\mathbb{E}\, Y^T A Y \ge \delta\, \mathbb{E}\, Y^T D Y$.

A first observation is that there must be a threshold $t^*$ such that, conditioned on that particular
choice of $t$, we still have $-\mathbb{E}[Y^T A Y \mid t = t^*] \ge \delta\, \mathbb{E}[Y^T D Y \mid t = t^*]$. Once we find such a threshold,
we can search in the sample space of $Y \mid t = t^*$, which is of polynomial size.

It remains to describe how to find a threshold $t^*$ as above. Let us say that two thresholds $t_1, t_2$ are
combinatorially indistinguishable if the sets of vertices $\{i : \delta t_1 \le |x_i| \le t_1\}$ and $\{i : \delta t_2 \le |x_i| \le t_2\}$
are equal, and call $S$ this set of vertices.

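Then we have

$$\frac{\mathbb{E}[Y^T A Y \mid t = t_1]}{\mathbb{E}[Y^T D Y \mid t = t_1]} = \frac{\sum_{i,j \in S} A_{i,j} x_i x_j / t_1^2}{\sum_{i \in S} d_i |x_i| / t_1} = \frac{1}{t_1} \cdot \frac{\sum_{i,j \in S} A_{i,j} x_i x_j}{\sum_{i \in S} d_i |x_i|}$$

and, similarly

$$\frac{\mathbb{E}[Y^T A Y \mid t = t_2]}{\mathbb{E}[Y^T D Y \mid t = t_2]} = \frac{1}{t_2} \cdot \frac{\sum_{i,j \in S} A_{i,j} x_i x_j}{\sum_{i \in S} d_i |x_i|}$$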

so that it is always preferable to choose the smaller threshold. This means that for every equivalence 
class of combinatorially indistinguishable thresholds we only need to look at one of them, in order 
to find $t^*$, and so we only need to consider at most $2|V|$ thresholds. In particular, $t^*$ can be found in
$O(|E| + |V|)$ time. A nearly pairwise independent sample space of size 0(| ) can be used instead
of a perfectly pairwise independent one so that the whole algorithm takes time 0(|T^| + \E\), at the 
price of a o(l) additive loss in the approximation. 

The following theorem summarizes our progress so far. 

Theorem 11 There is a nearly quadratic time algorithm that, given in input a graph $G = (V,E)$ such
that $\gamma(G) \ge \epsilon$, finds a set $S$ and a partition $(L,R)$ of $S$ whose gain is at least $e^{-O(1/\epsilon)}$.

Proof: We call the algorithm Four-Threshold Spectral Cut, or 4TSC.

• Algorithm 4TSC 

• Input: Graph G = (V, E) 

- Let $A$ be the adjacency matrix of $G$, $D$ be the matrix of degrees, $M := D^{-1/2} A D^{-1/2}$.
Find a vector $x \in \mathbb{R}^V$ such that $\epsilon := -x^T M x / x^T x \ge \frac{1}{2} |\lambda_n|$, where $\lambda_n$ is the smallest
eigenvalue of $M$. Set $\ell = 10/\epsilon$

- For every threshold $t$ in the set $\{x(i) : i \in V\} \cup \{e^{-\ell} x(i) : i \in V\}$

* Let $Y_1, \ldots, Y_n$ be a distribution of sample space $\Omega_t$ that is $\epsilon/10$-close to pairwise
independence, and such that $Y_i = 0$ if $|x_i| > t$ or $|x_i| < e^{-\ell} t$; and such that
$Y_i = \mathrm{sign}(x_i)$ with probability $|x_i|/t$ otherwise.

- Output the vector $y$ in the union of the $\Omega_t$ that maximizes the ratio $-y^T A y / y^T D y$.

Using the construction of almost pairwise independent random variables of Alon et al. [AGHP92],
each sample space $\Omega_t$ has size $O(\log n)$, and can be computed in $O(n)$ time. For each vector $y$, the
ratio can be computed in linear time. □
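
The rounding stage of 4TSC can be sketched in a few lines. This is a simplified illustration, not the algorithm as analyzed: it takes the vector $x$ as given (the eigenvector computation is omitted), it uses absolute values $|x(i)|$ for the thresholds, and it replaces the almost pairwise independent sample space of [AGHP92] with a fixed number of fully independent random trials per threshold. The names `gain`, `round_at_threshold`, `best_rounding` and the parameter `trials` are all hypothetical.

```python
import math
import random

def gain(edges, deg, y):
    """Return -y^T A y / y^T D y for an unweighted graph, or None if y = 0."""
    vol = sum(d * abs(v) for d, v in zip(deg, y))
    if vol == 0:
        return None
    # Each undirected edge {i, j} appears twice in the ordered sum y^T A y.
    return -2.0 * sum(y[i] * y[j] for i, j in edges) / vol

def round_at_threshold(x, ell, t, rng):
    """One sample of Y conditioned on threshold t (independent coordinates)."""
    y = []
    for xi in x:
        a = abs(xi)
        in_band = t > 0 and t * math.exp(-ell) <= a <= t
        keep = in_band and rng.random() < a / t
        y.append((1 if xi > 0 else -1) if keep else 0)
    return y

def best_rounding(n, edges, x, ell, trials=20, seed=0):
    """Try every candidate threshold and keep the sampled y of largest gain."""
    rng = random.Random(seed)
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    thresholds = sorted({abs(v) for v in x} | {math.exp(-ell) * abs(v) for v in x})
    best_y, best_gain = None, -math.inf
    for t in thresholds:
        for _ in range(trials):
            y = round_at_threshold(x, ell, t, rng)
            g = gain(edges, deg, y)
            if g is not None and g > best_gain:
                best_y, best_gain = y, g
    return best_y, best_gain

cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
y, g = best_rounding(4, cycle4, [0.5, -0.5, 0.5, -0.5], ell=2.0)
print(y, g)
```

On the 4-cycle with an alternating vector, the threshold $t = 0.5$ keeps every coordinate with probability one, so the sketch recovers the bipartition that cuts all edges.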

By iterating the algorithm we derive our main result of this section. 

Theorem 12 There is a nearly cubic time algorithm that, given in input a graph $G = (V,E)$ such that
$\text{max-cut-gain}(G) \ge \epsilon$, finds a cut $(L,R)$ of $V$ of gain $\ge e^{-O(1/\epsilon)}$.



8 Conclusions 

The motivating question for this work was to find a combinatorial interpretation of the quantity
$d - |\lambda_n|$ in a $d$-regular graph, akin to the interpretation of $d - \lambda_2$ provided by the theory of edge
expansion.
In establishing such an interpretation (in terms of the quantity that we call "bipartiteness ratio" in
Section 5) we proved that a natural and easy-to-implement spectral algorithm performs non-trivially 
well with respect to the Max Cut problem. 

The algorithm is very fast in practice [OT08]; using a termination rule that is slightly more relaxed 
than the one used in this paper (stopping when U + X > M/2, instead of U + X/2 > M/2), the 
algorithm makes at most one recursive call in all the experiments that we performed. It would be 
interesting to give a proof that this is always the case. 

A number of interesting open questions remain, such as:

1. What is the worst-case approximation ratio of our algorithm? We believe that our bound of .531
is not tight.

2. Is there a "purely combinatorial" algorithm (namely, one not involving numerical matrix 
computations) for Max Cut achieving an approximation factor better than 1/2? 

3. It should be possible to significantly improve our bounds for Max CutGain. 
A Appendix



A.1 Efficiency of the Arora-Kale Algorithm

Arora and Kale [AK07] describe an algorithm for the Goemans-Williamson SDP relaxation of Max
Cut which achieves an approximation ratio $1 + o(1)$ and runs in time $O(D_{\max} \cdot |V|)$ given in input
an unweighted multi-graph $G = (V,E)$ of maximum degree $D_{\max}$.⁵ In particular, it is possible to
find $(\alpha - o(1))$-approximate solutions to Max Cut in time $O(D_{\max} \cdot |V|)$, where $\alpha = .878\cdots$ is the
approximation ratio of the Goemans-Williamson algorithm.

In this section we show that, using the Arora-Kale algorithm and a reduction from [Tre01], it is
possible to approximate Max Cut within $\alpha - o(1)$ in time $O(|V| + |E|)$ regardless of the degree
distribution.⁶

Given the sparsification result discussed in Section 2, it is sufficient to prove the following theorem,
which is implicit in [Tre01].

Theorem 13 There is a randomized algorithm C and a deterministic algorithm R with the following
properties.

Given a graph $G = (V,E)$, algorithm C constructs in $O(|V| + |E|)$ time a graph $G' = (V',E')$
of maximum degree $O(1)$ with $|V'| = 2|E|$ vertices, such that the following happens with high
probability: (i) $\mathrm{maxcut}(G') \ge \mathrm{maxcut}(G) - o(1)$, and (ii) given an arbitrary solution $S' \subseteq V'$ of
cost $c$ in $G'$, algorithm R constructs in $O(|V| + |E|)$ time a solution $S \subseteq V$ of cost $\ge c - o(1)$ for
$G$.

Proof: We sketch how the argument in [Tre01] applies to Max Cut.

Define the weighted graph $\hat G = (\hat V, \hat E)$ as follows. (This graph will only be used in the analysis,
and not explicitly constructed in the reduction.) For every vertex $v \in V$ of degree $d_v$, $\hat V$ contains
$d_v$ copies of $v$; for every edge $(u,v)$ in $E$, we have $d_u \cdot d_v$ edges $(\hat u, \hat v)$ in $\hat E$, one for every copy $\hat u$ of
$u$ and for every copy $\hat v$ of $v$, each such edge having weight $1/(d_u \cdot d_v)$.

We claim that approximating Max Cut in $\hat G$ is equivalent to approximating Max Cut in $G$. First,
it should be clear that if $(S, V - S)$ is a cut in $G$ of cost $c$, then if we define $\hat S \subseteq \hat V$ to be the set of
all copies of vertices in $S$, then $(\hat S, \hat V - \hat S)$ is a cut of cost $c$ in $\hat G$. On the other hand, if $(\hat S, \hat V - \hat S)$ is
a cut of cost $c$, then consider the distribution over cuts in $G$ in which a vertex $v$ is picked to be in
$S$ with probability equal to the fraction of copies of $v$ which are in $\hat S$; the expected fraction
of cut edges in $G$ is exactly $c$, and using the method of conditional expectations we can find a cut
of cost at least $c$ in linear time.
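
The derandomization step can be illustrated as follows. The sketch takes, for each vertex $v$, the fraction `p[v]` of its copies on one side of the cut, and fixes vertices one at a time so that the expected number of cut edges never decreases. It is a simplified version under stated assumptions: the graph is unweighted, the function names are hypothetical, and the expectation is recomputed from scratch at every step, so this version is not linear time (a linear-time variant would only update the edges incident on the current vertex).

```python
def expected_cut(edges, p):
    """Expected number of cut edges when vertex v joins S independently with probability p[v]."""
    return sum(p[u] * (1 - p[v]) + p[v] * (1 - p[u]) for u, v in edges)

def derandomize(n, edges, p):
    """Method of conditional expectations: returns a set S cutting at least expected_cut(edges, p) edges."""
    p = list(p)  # work on a copy so the caller's list is untouched
    for v in range(n):
        p[v] = 1.0
        with_v = expected_cut(edges, p)
        p[v] = 0.0
        without_v = expected_cut(edges, p)
        # The expectation is linear in p[v], so the better endpoint does not decrease it.
        p[v] = 1.0 if with_v >= without_v else 0.0
    return [v for v in range(n) if p[v] == 1.0]

cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
fractions = [0.5, 0.5, 0.5, 0.5]
side = derandomize(4, cycle4, fractions)
cut = sum(1 for u, v in cycle4 if (u in side) != (v in side))
print(side, cut, expected_cut(cycle4, fractions))
```

Starting from fractions of one half on a 4-cycle, where the expected cut is two edges, the procedure ends with a cut of all four edges.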

The graph $G'$ is obtained by sampling with replacement $O(|\hat V|) = O(|V| + |E|)$ edges from $\hat E$, the edge set
of the auxiliary weighted graph $\hat G$, using the distribution in which an edge is sampled with probability proportional to its weight. As
discussed in Section 2, it follows from Chernoff bounds that a solution of cost $c$ in $G'$ has cost
$c \pm o(1)$ in $\hat G$.

It remains to discuss the complexity of sampling $G'$; to sample one edge, we first pick a random
edge $(u,v)$ of $G$, and then we pick at random one of the copies $\hat u$ of $u$ and one of the copies $\hat v$
of $v$; this distribution is equivalent to randomly sampling one of the edges of the auxiliary graph $\hat G$ with probability
proportional to its weight. After $O(|V| + |E|)$ time preprocessing, each edge of $G'$ can be sampled
in constant time.

□

⁵ The Arora-Kale result is more general, but this statement is sufficient for our purpose.

⁶ The running time can be reduced to $\tilde O(|V|)$ if the representation of the graph is such that a random edge can be
sampled in $O(1)$ time, and the degree of a given vertex can be found in $O(1)$ time.

⁷ The point of this discussion is that the auxiliary graph $\hat G$ may have $\Omega(|V|^2)$ edges even if $|E| = O(|V|)$, for example if there are two
vertices of degree $|V| - 1$. This means that it is not possible to explicitly construct $\hat G$ in $O(|V| + |E|)$ time, and so
one must sample edges from $\hat G$ without explicitly constructing $\hat G$.
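
The edge-sampling step in the proof of Theorem 13 can be sketched without ever building the auxiliary graph of vertex copies. The code below is an illustration under assumptions: copies of a vertex $v$ are represented as pairs `(v, k)` with `0 <= k < deg[v]`, the graph is an unweighted edge list, and the function name `sample_copy_edges` is made up.

```python
import random

def sample_copy_edges(n, edges, m, seed=0):
    """Sample m edges between vertex copies, with replacement.

    A uniformly random edge (u, v) of G is picked first, then uniformly random
    copies of u and of v. A specific copy edge is therefore chosen with
    probability 1 / (|E| * d_u * d_v), which is proportional to its weight.
    """
    rng = random.Random(seed)
    deg = [0] * n
    for u, v in edges:           # O(|V| + |E|) preprocessing: vertex degrees
        deg[u] += 1
        deg[v] += 1
    sampled = []
    for _ in range(m):           # constant work per sampled edge
        u, v = edges[rng.randrange(len(edges))]
        sampled.append(((u, rng.randrange(deg[u])), (v, rng.randrange(deg[v]))))
    return sampled

star = [(0, 1), (0, 2), (0, 3)]
print(sample_copy_edges(4, star, 5))
```

Only the degrees are stored, so memory stays linear in the size of $G$ even when the graph of copies would have quadratically many edges.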